Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

Abstract

Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big datasets that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either don't converge to the correct posterior or require storage of large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that this sampler is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel Pólya-Urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show, via an explicit example, that, contrary to popular belief in the topic modeling literature, partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora.
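
The abstract does not spell out how the Pólya-urn-based approximation is constructed. The following is a minimal sketch of one way such an approximation could look, assuming that the Dirichlet draw for a topic's word distribution is replaced by independent Poisson counts that are then normalized. The function names, the prior value `beta`, the toy counts, and the redraw-on-zero fallback are all illustrative assumptions, not details taken from the paper.

```python
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) by counting unit-rate exponential arrivals in [0, rate]."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival <= rate:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def poisson_urn_draw(word_counts, beta, rng, max_tries=1000):
    """Approximate a draw from Dirichlet(beta + word_counts).

    Assumption: each coordinate is drawn as an independent Poisson(beta + n)
    count and the counts are normalized to sum to one. If every count is
    zero, the draw is simply repeated (a hypothetical fallback).
    """
    for _ in range(max_tries):
        draws = [sample_poisson(beta + n, rng) for n in word_counts]
        total = sum(draws)
        if total > 0:
            return [d / total for d in draws]
    raise RuntimeError("all Poisson draws were zero; rates are too small")


# Toy topic: six vocabulary words, three of which never occur in the topic.
rng = random.Random(0)
word_counts = [500, 0, 0, 300, 0, 200]
phi = poisson_urn_draw(word_counts, beta=0.01, rng=rng)
print(phi)
```

Words with zero counts have rate `beta`, so their Poisson draws are almost always exactly zero and the resulting vector is sparse, whereas an exact Dirichlet draw would be dense. For words with large counts the normalized Poisson draws concentrate around the normalized counts, which is in the spirit of the abstract's claim that the approximation error vanishes as the data grows.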